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Abstract A new algorithm for solving large-scale convex optimization problems with a 
separable objective function is proposed. The basic idea is to combine three techniques: 
^— i ■ Lagrangian dual decomposition, excessive gap and smoothing. The main advantage of this 

algorithm is that it dynamically updates the smoothness parameters which leads to numeri- 
cally robust performance. The convergence of the algorithm is proved under weak conditions 
imposed on the original problem. The rate of convergence is O(j), where k is the iteration 
counter. In the second part of the paper, the algorithm is coupled with a dual scheme to con- 
struct a switching variant of the dual decomposition. We discuss implementation issues and 
make a theoretical comparison. Numerical examples confirm the theoretical results. 
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1 Introduction 

> 

— , Large-scale convex optimization problems appear in many areas of science such as graph 

theory, networks, transportation, distributed model predictive control, distributed estimation 
and multistage stochastic optimization |8. 17 21 ,22,24,32 34,38,39 40.41 1. Solving large- 
scale optimization problems is still a challenge in many applications |9|. Over the years, 
| thanks to the development of parallel and distributed computer systems, the chances for 

solving large-scale problems have been increased. However, methods and algorithms for 
solving this type of problems are limited J2J9J. 

Convex minimization problems with a separable objective function form a class of prob- 
lems which is relevant in many applications. This class of problems is also known as sep- 
arable convex minimization problems, see, e.g. [2 ]. Without loss of generality, a separable 
convex optimization problem can be written in the form of a convex program with sepa- 
rable objective function and coupled linear constraints [2|. In addition, decoupling convex 
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constraints may also be considered. Mathematically, this problem can be formulated in the 
following form: 

M 

min <h(x) := V <t>i(xi) 
xeK" Yy ' g m ' 

s.t. Xi eXi (f=l,-.. ,M), (1) 

M 

where 0,- : K"' — > Kis convex, X,- e W' is a nonempty, closed convex set, A; 6 R mx "<, £> 6 R m 

for all / = 1, . . . ,M, and ni + «2 H h«Af =»• The last constraint is called coupling linear 

constraint. In principle, many convex problems can be written in this separable form by 
doubling the variables, i.e. introducing new variables x, and imposing the constraint x, = x. 
Despite the increased number of variables, treating convex problems by doubling variables 
may be useful in some situations, see, e.g. 11111121 . 

In the literature, numerous approaches have been proposed for solving problem ([T}. For 
example, (augmented) Lagrangian relaxation and subgradient methods of multipliers l2l 
[131133 391. Fenchel's dual decomposition 1151 . alternating linearization [6 12 23], proximal 
point-type methods (41171 1371 . interior point methods 1121 114 Ill25ll36l , mean value cross de- 
composition 1181 and partial inverse method 11351 among many others have been proposed. 
Our motivation in this paper is to develop a numerical algorithm for solving (0 which can 
be implemented in a parallel or distributed fashion. Note that the approach presented in the 
present paper is different from splitting methods and alternating methods considered in the 
literature, see, e.g. ll6l [T0l . 

One of the classical approaches for solving ([TJ is Lagrangian dual decomposition. The 
main idea of this approach is to solve the dual problem by means of a subgradient method. 
It has been recognized in practice that subgradient methods are usually slow and numeri- 
cally sensitive to the step size parameters. In the special case of a strongly convex objective 
function, the dual function is differentiable. Consequently, gradient schemes can be applied 
to solve the dual problem. 

Recently, Nesterov [29 ] developed smoothing techniques for solving nonsmooth convex 
optimization problems based on the fast gradient scheme which was introduced in his early 
work [28 1 . The fast gradient schemes have been used in numerous applications including 
image processing, compressed sensing, networks and system identification fT1l5irT4lfT 6 12 
HO- 

Exploiting Nesterov's idea in 1301 , Necoara and Suykens [27] applied a smoothing tech- 
nique to the dual problem in the framework of Lagrangian dual decomposition and then 
used the fast gradient scheme to maximize the smoothed function of the dual problem. This 
resulted in a new variant of dual decomposition algorithms for solving separable convex op- 
timization. The authors proved that the rate of convergence of their algorithm is O(j) which 
is much better than 0(-^) in the subgradient methods of multipliers, where k is the iteration 
counter. A main disadvantage of this scheme is that the smoothness parameter requires to 
be given a priori. Moreover, this parameter crucially depends on the given desired accuracy. 
Since the Lipschitz constant of the gradient of the objective function in the dual problem 
is inversely proportional to the smoothness parameter, the algorithm usually generates short 
steps towards a solution of the problem although the rate of convergence is O(j). 

To overcome this drawback, in this paper, we propose a new algorithm which combines 
three techniques : smoothing [30,31], excessive gap [ 3 1| and Lagrangian dual decomposition 
[2] techniques. Instead of fixing the smoothness parameters, we update them dynamically 
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at every iteration. Even though the worst case complexity is O(^), where e is a given toler- 
ance, the algorithms developed in this paper work better than the one in |27 | and are more 
numerically robust in practice. Note that the computational cost of the proposed algorithms 
remains almost the same as in the proximal-center-based decomposition algorithm proposed 
in 1127 1 Algorithm 3.2]. (Algorithm 3.2 in [27 1 requires to compute an additional dual step). 
This algorithm is called dual decomposition with primal update (Algorithm [TJ. Alterna- 
tively, we apply the switching strategy of 1311 to obtain a decomposition algorithm with 
switching primal-dual update for solving problem ([T}. This algorithm differs from the one 
in Oil at two points. First, the smoothness parameter is dynamically updated with an exact 
formula and second the proximal-based mappings are used to handle the nonsmoothness of 
the objective function. The second point is more significant since, in practice, estimating 
the Lipschitz constants is not an easy task even if the objective function is differentiable. 
The switching algorithm balances the disadvantage of the decomposition methods using the 
primal update (Algorithm[TJ and the dual update (Algorithm 3.2 1271 1. Proximal-based map- 
ping only plays a role of handling the nonsmoothness of the objective function. Therefore, 
the algorithms developed in this paper do not belong to any proximal-point algorithm class 
considered in the literature. Note also that all algorithms developed in this paper are first 
order methods which can be highly distributed. 

Contribution. The contribution of this paper is the following: 

1. We apply the Lagrangian relaxation, smoothing and excessive gap techniques to large- 
scale separable convex optimization problems which are not necessarily smooth. Note 
that the excessive gap condition that we use in this paper is different from the one in 
1311 , where not only the duality gap is measured but also the feasibility gap is used in 
the framework of constrained optimization, see (1231 . 

2. We propose two algorithms for solving general separable convex optimization problems. 
The first algorithm is new, while the second one is a new variant of the first algorithm 
proposed in Oil Algorithm 1] applied to Lagrangian dual decomposition. A special 
case of the algorithms, where the objective is strongly convex is considered. All the 
algorithms are highly parallelizable and distributed. 

3. The convergence of the algorithms is proved and the rate of convergence is estimated. 
Implementation details are discussed and a theoretical and numerical comparison is 
made. 

The rest of this paper is organized as follows. In the next section, we briefly describe the La- 
grangian dual decomposition method [2 | for separable convex optimization, the smoothing 
technique via prox-functions as well as excessive gap techniques [31|. We also provide sev- 
eral technical lemmas which will be used in the sequel. Section[3]presents a new algorithm 
called decomposition algorithm with primal update and estimates its worst-case complexity. 
Section [4] is a combination of the primal and the dual step update schemes which is called 
decomposition algorithm with primal-dual update. Section [5] is an application of the dual 
scheme d55 h to the strongly convex case of problem (|2j. We also discuss the implementation 
issues of the proposed algorithms and a theoretical comparison of Algorithms Q] and [2] in 
Section [6] Numerical examples are presented in Section [7] to examine the performance of 
the proposed algorithms and to compare different methods. 

Notation. Throughout the paper, we shall consider the Euclidean space MP endowed with 
an inner product x T y for x,y 6 R" and the norm ||jc|| := \/x T x. Associated with || ■ ||, || ■ 
||* := max{(-) r je : \\x\\ < 1} defines its dual norm. For simplicity of discussion, we use 
the Euclidean norm in the whole paper. Hence, || ■ ||* is equivalent to || • ||. The notation 
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x = (xi, . . . ,xm) represents a column vector in W, where x; is a subvector in W' , i = l,...,M 
and n\ + • — h % = «. 

2 Lagrangian dual decomposition and excessive gap smoothing technique 

A classical technique to address coupling constraints in optimization is Lagrangian relax- 
ation |2|. However, this technique often leads to a nonsmooth optimization problem in the 
dual form. To overcome this situation, we combine the Lagrangian dual decomposition and 
smoothing technique in [30 31] to obtain a smoothly approximate dual problem. 

For simplicity of discussion, we consider problem ([T} with M = 2. However, the methods 
presented in the next sections can be directly applied to the case M > 2 (see Section[6]l. The 
problem (0 can be rewritten as follows: 



where 0; : W — > K is convex, Xi is a nonempty, closed, convex and bounded subset in 
R"', Aj £ R" IX "< and b e R m (i = 1,2). Problem © is said to satisfy the Slater constraint 
qualification condition if ri(X) n {x = (xi,xz) \ Aijci +A2X2 = b] 0, where n{X) is the 
relative interior of the convex set X. Let us denote by X* the solution set of this problem. 
Throughout the paper, we assume that: 

A. 1 The solution set X* is nonempty and either the Slater qualification condition for prob- 
lem (|2]l holds or Xi is polyhedral. The function (j>i is proper, lower semicontinuous and convex 
inW, i= 1,2. 

Since X is convex and bounded, X* is also convex and bounded. Note that the objective 
function (j) is not necessarily smooth. For example, (j)(x) = \\x\\\ = Y!i=\ |*(i)|> which is is 
nonsmooth and separable. 

2. 1 Decomposition via Lagrangian relaxation 

Let us define the Lagrange function of the problem (|2} with respect to the coupling constraint 
A\x\ +A2X2 = b as: 



where y 6 R m is the multiplier associated with the coupling constraint A\x\ +A2X2 = b. A 
triplet (x\,x'2,y*) € X x W" is called a saddle point of L if: 




(2) 



L(x,y) := <j>i (xi) +^(jc 2 ) +y T (A 1 x i +A 2 x 2 - b), 



(3) 



L(x*,y) < L{x*,y*) < L(x,y*), \/x eX,\/ye R m . 
Next, we define the Lagrange dual function d of the problem l|2} as: 

d(y) :=mm{L(ij) := <j>i(xi) +<j) 2 {x 2 ) +y T {A l x l +A 2 x 2 -b)} . 

and then write down the dual problem of (|2}: 



(4) 



(5) 



d*:= 




(6) 
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Let A = [Ai ,A2]. Due to Assumption A\T\strong duality holds and we have: 

d* = maxd(y) s,trong _i! uallty m in{<ft(Y) \ Ax = b } = <j>* . (7) 

y eV xex Lrw ' 

Let us denote by Y* the solution set of the dual problem l(6}. It is well-known that Y* is 
bounded due to Assumption AQ] 

Now, let us consider the dual function d defined by l|5]l. It is important to note that the 
dual function d(y) can be computed separately as: 

d(y)=d 1 (y)+d 2 (y)-b T y, (8) 

where 

d t (y) := min +y T A f x i } , i = 1,2. (9) 

We denote by x*(y) a solution of the minimization problem in (0 (i = 1,2) and x*(y) := 
WWj^W)' Since 0,- is continuous and X; is closed and bounded, this problem has a solu- 
tion. Note that if x* (y) is not uniques for a given y then di is not differentiable at the point 
y (i = 1,2). Consequently, d is not differentiable at y. The representation ®-(|9} is called a 
dual decomposition of the dual function d. 



2.2 Smoothing the dual function via prox-functions 

By assumption that X,- is bounded, instead of considering the nonsmooth function d, we 
smooth the dual function d by means of prox-functions. A function p; is called a proximity 
function (prox-function) of a given nonempty, closed and bounded convex set X,- C W* if Pi 
is continuous, strongly convex with convexity parameter ct,- > andX,- C dom(p,). 

Suppose that /?, is a prox-function of X,- and ct,- > is its convexity parameter (i = 1,2). 
Let us consider the following functions: 

di{y,Pi) :=nm{^(x i )+y T A i x i + Pip i (x i )}, i = 1,2, (10) 

d(y;p l ):=d l ( y ;p 1 )+d 2 (y;pi)-b T y. (11) 

Here, jSi > is a given parameter called smoothness parameter. We denote by x* (y; jSi ) the 
solution of fllOl l. i.e.: 

x*(y;j6i) := argmin +y T A i x i + Pip i {x i )}, i = 1,2. (12) 

Note that it is possible to use different parameters jSf for fllOt (i = 1 , 2). 
Let X; be the prox-center of X, which is defined as: 

x\ = argmin pi(xi), i = 1,2. (13) 
XjeXj 

Without loss of generality, we can assume that Pi(x?) = 0. Since X, is bounded, the quantity 

Dj := maxpi(xi) (14) 

is well-defined and < D, < +°° for i = 1,2. The following lemma shows the main proper- 
ties of d(-;j3i), whose proof can be found, e.g., in 127 113 II . 
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Lemma 1 For any /3i > 0, the function d\ ( ■ ; jSi ) defined by d 1 Ob is well-defined and contin- 
uously differentiable on W". Moreover, this function is concave and its gradient w.r.t y is 
given as: 

Vrf,(y;jSi)=A,x*(v; J S 1 ), i=l,2, (15) 

which is Lipschitz continuous with a Lipschitz constant L^(jSi) = ~j^§r = 1-2). The fol- 
lowing estimates hold: 

diiy-Pi) > di(y) > diiy^i) - PiDi, i=l,2. (16) 

Consequently, the function d(-; jSi) defined by dl lb ii concave and differentiable and its gra- 
dient is given by Vd(y; /3i ) := Ax* (y; jSi ) — which is Lipschitz continuous with a Lipschitz 
constant L ■ (pi) := jjj-L/=i g ■ Moreover, it holds that: 

d(y;Pi) > d(y) > d(y,pi)- fa{Di +D 2 ). (17) 

The inequalities J 17b show that d(-;f5i) is an approximation of d Moreover, d( ;jSi) con- 
verges to J as jSi tends to zero. 

Remark 1 Even without the assumption that X is bounded, if the solution set X* of l[2} is 
bounded then, in principle, we can bound the feasible set X by a large compact set which 
contains all the sampling points generated by the algorithms (see Section|4]below). However, 
in the following algorithms we do not use D/, i = 1,2 (defined by ( 114b ) in any computational 
step. They only appear in the theoretical complexity estimates. 

Next, for a given > 0, we define a mapping ^2) from X to K by: 

V/(x ;j 8 2 ):=max{(Ax-fc) r y-f ||>f ). (18) 

yeR'" I 2 I 



This function can be considered as an approximate version of \j/(x) := max {(Ax — b) T y\ 

yeK" 

using the prox-function p(y) := ^ ||y|| 2 - It is easy to show that the unique solution of the 
maximization problem in i ll 8t is given explicitly as y*(x;pi) = j^(Ax — b) and ^f(x;pi) = 
2^- ||Ax — b\\ 2 . Therefore, ^(sft) is well-defined and differentiable on X. Let 

/(*;&) := + yr(xih) = $(x) + ±-\\Ax-b\\ 2 . (19) 

zp 2 

The next lemma summarizes the properties of )|/(-;jS 2 ). 

Lemma 2 For any jS 2 > 0, the function \ff(-; jS 2 ) defined by dl8b is continuously differen- 
tiable on X and its gradient is given by: 

V ¥ {x;fc) = (V C1 v/(x ;j S 2 ), V. r2V /(x;ft)) = (A[y*(x;/3 2 ), A^y* (x;/3 2 )), (20) 

which is Lipschitz continuous with a Lipschitz constant ZT(/3 2 ) := jj^(||^l|| 2 + 11^2 1| 2 )- 
Moreover, the following estimate holds for all x,x 6 X: 

V/(x;/3 2 ) < ^;jS 2 ) + Viv/(i;jS 2 ) 7 '(x 1 -xi) + V2^;jS 2 ) r (x 2 -x 2 ) 

Lf(p2) s , l2l ^(fe) „ p||2 (21) 

+- L 2 — +^ — Ir>-*2|| , 



where L* (fa) := j|||Ai|| 2 andl^ifit) := £||A 2 | 



2 
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Proof Since )^(;t;ft) = j^\\A\x\ +A2X2 — b\\ 2 by the definition <Tl~8T> and y*(x: ft) = jr{A\x\ + 
A2X2 — b), it is easy to compute directly Vi|/(-;ft). Moreover, we have: 



V/(x; J S 2 )-V(-«;ft)-Vv/(i;/3 2 ) 7 '(-x-x) = r^-||Ai (*i -ft) +A 2 (x 2 -x 2 )f 

< -^WAifWxi -x 1 || 2 + -^-||A 2 || 2 ||x 2 -i 2 | 





This inequality is indeed J21t . 



□ 



From the definition of /(-;ft), we obtain: 



/(x;ft) - -i-||Ax- fo|| 2 = < /(x;ft). 



(23) 



Note that /(-;ft) is an upper bound of instead of a lower bound as in 1311 . Note that 
the Lipschitz constants in d21b are roughly estimated. These quantities can be quantified 
carefully by taking into account the problem structure to trade-off the computational effort 
in each component subproblem. 



2.3 Excessive gap technique 

Since the primal-dual gap of the primal and dual problems (|2j-(|6]l ls measured by g(x,y) := 
<j> (x) — d(y), if the gap g is equal to zero for some feasible point x and y then this point is an 
optimal solution of l|2]l-l|6]l. In this section, we apply to the Lagrangian dual decomposition 
framework a technique called excessive gap proposed by Nesterov in 11311 . 

Let us consider d(y;f5i) := d(y;pi) - p\(L>i +D 2 ). It follows from <[T7j and (|23j that 
d(-;f5i) is an underestimate of d(-), while /(-;ft) is an overestimate of $(■). Therefore, 
< g(x,y) = 0(jc) - d(y) < /(or, ft) - d(y;j5\) + ft(Di +D 2 ). Let us recall the following 
excessive gap condition introduced in [31 ]. 

Definition 1 We say that a point (x,y) 6 X x R'" satisfies the excessive gap condition with 
respect to two smoothness parameters ft > and ft > if: 



where /(-;ft) andrf(-;jSi) are defined by (123 ) and (lilt , respectively. 

The following lemma provides an upper bound estimate for the duality gap and the feasibil- 
ity gap of problem @. 

Lemma 3 Suppose that (x,y) 6 X x W satisfies the excessive gap condition \2A\ . Then for 
any y* 6 Y*, we have: 



f(x;p 2 )<d(y;P l ), 



(24) 



<(l>(x)-d(y)<p 1 (D l +D2)-^\\Ax-b\\ 2 <p 1 (D 1 +D 2 ), (25) 



and 



\\Ax-b\\ 




(26) 
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Proof Suppose that x and y satisfy condition d24b . For a given y* g Y* , one has: 
d(y) <d(y*)= mm {^>(x) + (Ax -b) T y*} < <j>(x) + (Ax-b) T y* 
<$(x) + \\Ax-b\\\\y*l 
which implies the first inequality of ( 125 b . By using Lemma[T]and l !19b we have: 

<t>(x) - d(y) 17 < 23 /(*;&) - d(y;fr ) + J3, (Z), + D 2 ) --^-||Ax- f>|| 2 . 

Zjt> 2 

Now, by substituting the condition l !24b into this inequality, we obtain the second inequality 
of dBJ. Let 77 := ||Ajc-&||. It follows from <(25j» that rj 2 - 2p 2 \\y* -2j6ijS 2 (/)i +D 2 ) < 0. 
The estimate 026b follows from this inequality after few simple calculations. □ 

3 New decomposition algorithm 

In this section, we derive an iterative decomposition algorithm for solving l|2} based on 
the excessive gap technique. This method is called a decomposition algorithm with primal 
update. The aim is to generate a point (x,y) 6 X x K m at each iteration such that this point 
maintains the excessive gap condition ( 124b while the algorithm drives the parameters jSi and 
P2 to zero. 

3.1 Proximal mappings 

As assumed earlier, the function (j), is convex but not necessarily differentiable. Therefore, 
we can not use the gradient information of these functions. We consider the following map- 
pings (i= 1,2): 

Pi{x;fr) := argmin (fcfe) +/(x;/3 2 ) r A,-(*« - *,■) + ^^||*/ -ii|| 2 ) , (27) 

where y* (x; jS 2 ) := -^(Ax — b). Since Lf(p2) defined inLemma[2]is positive, f}(-;/3 2 ) is well- 
defined. This mapping is called proximal operator (7). LetP(-;jS 2 ) = (Z ) i(-;j6 2 ),P 2 (-;j6 2 )). 

First, we state that the excessive gap condition d24b is well-defined by showing that there 
exists a point (x,y) that satisfies (124b . This point will be used as a starting point in Algorithm 
[T]described below. 

Lemma 4 Suppose that x c = (ij;^) is the prox-center ofX. For a given /3 2 > 0, let us 
define: 

y:=—{Ax c -b) and * := P(* c ; ft). (28) 
P2 

If the parameter /3i is chosen such that: 

J S 1J S 2 >2max(M£\, (29 ) 

1<!< 2 [ Oi J 

then (x, y) satisfies the excessive gap condition d24b . 
The proof of Lemma|4]can be found in the appendix. 
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3.2 Primal step 

Suppose that (x,y) 6 X x R m satisfies the excessive gap condition ( 124b . We generate a new 
point (x + ,y + ) elx R m and by applying the following update scheme: 



x:=(l-T)* + T**(y ;j 8i), 

y+:=(l- T )3; +T /(i; J S + ), (30) 

J3+:=(1-T)j3 1 andj3+ = (1-T)j3 2 , (31) 
where P(-;jS 2 + ) = (Pi(-;jS 2 + ),P 2 (-;jS 2 + )) and T e (0, 1) will be chosen appropriately. 

Remark! In the scheme ( 130b . the points x*(y\$\) = (xj (y; jSi ) , ^(y; jSi )), x = (Jci ,^2) and 
x + = {x~l ,x.2 ) can be computed in parallel. To compute x*{y;fi\) and x + we need to solve 
the corresponding convex programs in R" 1 and R" 2 , respectively. 



The following theorem shows that the update rule (130) maintains the excessive gap con- 
dition d24t . 

Theorem 1 Suppose that (x,y) 6 X x R m satisfies ( 124b vWfn respect to two values jSi > 
and j6 2 > 0. TTien generated by scheme OOb - Olb is in X x R'" and maintains the 

excessive gap condition ( 124b with respect to two smoothness parameter values j6j + and jS^ 
provided that: 

j3ifc>7T^max(Ml. (32) 



(1 - t) 2 i<;<2 [ a. 

Proof The last line of OOb shows that x + 6 X. Let us denote by y = y*(Jc;jS 2 f ). Then, by 
using the definition of d(-; /3i), the second line of OOb and /3[ + = (1 — t)/3i, we have: 

d(y+ ; J3+) = mm {0 (x) + (Ax - + P? [?i (*i ) + P2 (* 2 )] } 
^163 min { (jc) + ( 1 — t) (Ax - + t(Ax - bfy 

+ (1-t) j B 1 [ Pi (x 1 )+ P2 (x 2 )]} (33) 
= mm {(1 - T) [Q(x) + (Ax -b) T y + ft [pi (xi ) + f> 2 (x 2 )]] 

+ x[<l>{x) + (Ax-b) T y]} . 
Now, we estimate the first term in the last line of ( 133b . Since jS^ = (1 — T)p2, one has: 

VCSfc) = = (1 -t)-^||Ax-/7|| 2 = (1-t) ¥ (x;J3+). (34) 

2P2 2p 2 r 
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Moreover, if we denote by x 1 = x*(y;/3i) then, by the strong convexity of p\ and pi, d34t 
and fix; /3 2 ) < d(y; ft ), we have: 

Ti :=*(*) + (Ax - bfy + fr [pi (jcj ) + p 2 (x 2 )} 

>rmn{(j)(x) + {Ax-b) T y+p 1 \py(xy)+p2(x2)}} + ^i[ay\\x l -x\\\ 1 +a 2 \\x2-x\\\ T \ 

= d&fr) + ift [aj II*! -*{ || 2 + a 2 ||x 2 -x'U 2 ] 

(24) 1 ( 35) 

> /(x;j3 2 ) + ~p, [ CTl ||^-.,|||2 + a 2 ||x 2 -x 2 || 2 ] 

def - {k m ^(jc) + ft) + ift [ CTl ||x, -x| || 2 + a 2 ||x 2 -x 2 || 2 ] 

i* + v/(x ;j S+) + ift [ai||*, -x| || 2 + a 2 |jx 2 -x'|| 2 ] - tiKz;/3+) 

^ *(*) + v/(i;ft+) + Vv/(x ;j S+) r (x-x) + Ift ||x, -xj || 2 + a 2 ||x 2 -x 2 || 2 ] 

+-i T ||A(x-x)|| 2 -T V /(x; J B 2 + ). 
zp 2 

For the second term in the last line of d33t . we use the fact that y = 4+ (Ax — b) and 
Vy\jf(x~; P2) = A T y to obtain: 

T 2 :=<j)(x) + (Ax-b) T y 

= (j)(x) + y T A(x-x) + (Ax-b) T y 

, (36) 
def - U m (x) + V W (x;f$+f(x-x) + —\\Ax-b\\ 2 

P2 



def. \jf 



(j) (x) + v/(x; p+) + Vw(x; jS 2 + ) r (x - x) + \ff(x; J3+ 



Substituting < f35T > and l T3"6l > into l l33l and noting that ( 1 — t) (x — x) + t(x — x) = t(x — x 1 ) due 
to the first line of OOb . we obtain: 

d(y+;P+) =min{(l-T)J 1 +T7 2 } 

OU+03 r r 

> min-^ (1 - t) \<f>(x) + V/(x;/3 2 + ) + Vv/(x;jS 2 +) r (x-x) 

+ -/3l [CTlljx, -x||| 2 + CT 2 ||x 2 -X 2 || 2 ] 



+ v/(x;/3 + ) + V ¥ (x;p+) T (x-x)] } 



-t(1 - t)v/(x;/3+) + ^^||A(x--x)|| 2 + tv/(x;/3+) (37) 
zp 2 



= min|(l- t)<!>(x) +z<t>(x) + ^(x-^ + )+V^(x;^ + ) T [(1 - t)(x-x) + t(x-x)] 
+i(l-T)ft ||jci - jc} || a + - ^4 II*] } + T 3 

0— convex 

> min|0((l - t)x+tx) + v/(x;jS 2 + ) + TVi/^(x;jS 2 + ) r (x-x 1 ) 
+i(l-r)ft [^Ih-xlf + a.l^-x.H 2 ] }+T 3 , 
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where T 3 := %^||A(jE-x)|| 2 + TV/(i;jS 2 + ) - t(1 - t)v/(x;/3 2 + ). Next, we note that the con- 
dition 02b is equivalent to: 

2t 2 

(1-T)j3ig f > ||A / || 2 >Lf(/3+)T 2 , i=l,Z (38) 

(I-TJP2 

Moreover, if we denote by m := x + t(x — x) then: 

m — x = x + t(x — x) — x = x+ t(x — x) — (1 — i)x— Tx l = t(x — x l ). (39) 
Now, by using Lemma[2] the condition d38b and d39l l. the estimation d37l > becomes: 

<y + ;j3i + )-T 3 > min U(u) + v(i;/3 2 + ) + Vy(je*,ft) r («-*) 

u:=x+z(x—x)ex+t(X—x) y. 
61(1 — T)CTi . 2 /3i(l - T)q 2 M . ||2 \ 

I+T(X-x)CX 



> min 



{ v(£ J3 2 + ) + (?) («) + Vv/(.f ; j3 2 + ) r (« - *) (40) 
. a j3i(l-T 



+ ^2 Pi-^ill + 2T 2 ll M2_X2 ll ) 

ED r ^ 1 rp 

> rrrin|0(i/) + r)+Vy(i;j3 2 T) (w-Jc) 

f ||2, 4(fe + ) |, f|| 2l 
H ^ ||Ki— Xi|| H ||«2-JC2|| J 



2 2 

38 WO + v(-«;j6 2 + ) + Vv/(i;iS 2 + ) r (^ -i) 



+^f 1 H-^\\ 2 + ^f 1 \\4-M 2 

f«?(i + ) + V/(^; J S 2 + )=/(x + ; J S 2 + ). 

To complete the proof, we show that T3 > 0. Indeed, let us define u:=Ax — b and u :=Ax—b, 
then u — u = A (x — x) . We have: 

= ^+ [ T II"H 2 - T (! " T )II"H 2 + (1 - T )H" " "II 2 ] 
Z P 2 

= [ T II"H 2 " T ( ! " T )II M "H 2 + (! " T )I!"H 2 + (! " T )H M 1! 2 " 2 (! " ^"1 < 41 ) 

= [||«|| 2 + (l-T) 2 ||«|| 2 -2(l-T)« r «] 
zp 2 



1 



|w-(l-T)i7|| 2 >0. 



Substituting (EB into i@0]> we obtain the inequality d(y + ; > /(jc+; /3 2 + ). □ 
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Remark 3 If 0; is convex and differentiable and its gradient is Lipschitz continuous with a 
Lipschitz constant hf > for some i = 1,2, then instead of using the proximal mapping 
Pi{-\ j6 2 ) in BOb we can use the gradient mapping which is defined as: 

r V(B + ) -\ 

G ; (i;/3 2 + ):=argmin V^,(i«) 7 '(-«/--«/)+/(^ft) r A i (x,-i,) + ^f^ -*,-|| 2 , (42) 

where L^(jS 2 + ) := La + 2 -^f _ - Indeed, let us prove the condition fi?(y + ;jSj + ) > /(i+ijS^), 

where G(x; P2) '■= (Gi {x\ ; /3 2 ),G 2 (x 2 ; ft)) andl + := G^jjS^). First, by using the convexity 
of 0, and the Lipschitz continuity of its gradient, we have: 

U^)+ V Uxi) T (m-Xi) < $(«,•) < Ux^+VUxifiiii-Xi) + bk\\ Ui -x i \\ 2 . (43) 
Next, by summing up the second inequality from i = 1 to 2 and adding to d21b we have: 
<j>{u) + v(«;j3 2 + ) < <H*) + V&PZ) + [V0(*) + Vv/(i;jS 2 + )] r («- £) 

.TO),, mi^^ + )„ - l|2 (44) 

Finally, from the second inequality of d40t we have: 

^(y ;ft + )-T3 > mm|(?(») + v/(i;/3 2 + )+V^(i;jS 2 + ) r ( M -i) 

(1 - T)j3i<Ti ,. , |2 (!-T)j3iCJ 2 || 2 1 
H ^ ll M l- x l|l H ^ 1 1 i/2 — JC2 1 1 j 



2 T 2 ^" ' 2T 2 

convex+ 1441 r 

> min U ( jf ) + V<j> (x) T ( u - x) + \j/(3c; jS 2 + ) + V \jf(x; /3 2 + ) T (1 



+ 1 v 2 z / n«i -*iir+ z 2 z / 1[^2 - JC2I1 2 } 

(jc) + v(£ jS+) + [V0 (x) + Vy(*,P})] T {x + - x) 

(44} 

> «Kl+) + v/(i + ;j8 2 + ) =/(!+;£+). 

In this case, the conclusion of Theorem [TJ is still valid for the substitution x + := G(Jc;/3 2 + ) 
provided that: 

If Xi is polytopic then problem d42b becomes a convex quadratic programming problem. 

Now, let us show how to update the parameter X such that the condition d32l l holds for 
/3[ + and /3 2 + '. From the update rale (|3~H we have jS^jS^ = (1 — t) 2 /3ijS 2 . Suppose that ft and 
jS 2 satisfy the condition l !32b . i.e.: 

T 2 f IL4-I 

ft& > 7z ttjL, where L:=2max 

(1 - T) 2 1<K2 [ (7, 
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If we substitute /3i and fa by /3[ + and jS^, respectively, in this inequality then we have 
Pi ft > (T^F 1 - However ' since Pi ft = (1 " *) 2 Pi02. it implies frfe > (1 _ t) 4_ t+) ^ . 

2 x 2 

Therefore, if 7J3^5 > (i-t) 2 (i-t ) 2 t ' len anc ^ ^2 satisfy ( 132b . This condition leads to 
T > ^ . Since x. T + 6 (0, 1), the last inequality implies < T + < j and 

0<T+<— V — <1. (46) 

T+ 1 

Hence, 130H3U are well-defined. 

Now, we define a rale to update the step size parameter T. 

Lemma 5 Suppose that To is arbitrarily chosen in (0, j). Then the sequence {Zk}k>o gen- 
erated by: 

ijt + 1 



satisfies the following equality: 



T k = —^—, V£>0. (48) 

1 + To« 



Moreover, the sequence {Pk}k>0 generated by Pk+i = (1 — Tk)fik for fixed /3o > satisfie, 



At = — r~~T > VA:>0. (49) 
To*:+ 1 

Proof If we denote by r := ~ and consider the function £j(f) := ? + 1 then the sequence 
{tk}k>o generated by the rale tk+i := E, (tt) = h + 1 satisfies tk =to + k for all /: > 0. Hence 

T k = T t = l^k = for k ^ °- To P rove <E2), we observe that = AoIlioC 1 " T «')- 

Hence, by substituting ( 148b into the last equality and carrying out a simple calculations, we 
get @9j. □ 

Remark 4 Since To € (0,0.5), from Lemma|5]we see that with To — > 0.5~ (e.g., To = 0.499) 
the right-hand side estimate of i4% is minimized. 



3.3 The algorithm and its worst case complexity 

Before presenting the algorithm, we assume that the prox-center of X, is given a priori 
for (£ = 1,2). Moreover, the parameter sequence {t^} is updated by J47b . The algorithm is 
presented in detail as follows: 



ALGORITHM 1 (Decomposition Algorithm with Primal Update) 
Initialization: 

1. Set T := 0.499. Choose jSf > and /3 2 ° > as follows: 



ft u = j3 2 u := 1 /2max(M 

■ \<i<i y (t, 
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2. Compute jr and y from ( 128b as: 

f := -1 (A* c - 6) and x° := P(a c ; /3 2 °) , 
P2 

Iteration: For & = 0, 1, • • • do 

1 . If a given stopping criterion is satisfied then terminate. 

2. Update the smoothness parameter /3| +1 := (1 — T^jS*. 

3. Compute x^ +l in parallel for i = 1,2 and by the scheme 1301 : 

(/+>,/+>) :=<(^;/3f,# + U). 

4. Update the smoothness parameter: j3 k+1 := (1 — Tk)Pi- 

5. Update the step size parameter by: T^ + i := "^+y- 

End of For. 

As mentioned in Remark[2] there are two steps of the scheme stfZ at Step 3 of Algorithm[T] 
that can be parallelized. The first step is finding x* (y k ; /3i ) and the second one is computing 
x k+l . In general, both steps require solving two convex programming problems in parallel. 
The stopping criterion of Algorithm [TJat Step 1 will be discussed in Section[6] 

The following theorem provides the worst-case complexity estimate for Algorithm[TJ 

Theorem 2 Let {(f,y k )} be a sequence generated by Algorithm^]] Then the following du- 
ality gap and feasibility gap hold: 



and 



^)-d(f)<^[T D2) > (50) 
rv ; J ' - 0.49%+ 1 

\\A^-b\\ < S ' L 



0.499/t+l 



y*|| 2 + 2(D 1 +D 2 ) 



(51) 



IK' 2 



where L:=2 max < - — S and y* G Y* 

l</<2 [a; 1 

Proof By the choice of = /3 2 = 

and Steps 1 in the initialization phase of Algorithm 
[T] we see that j3 k = /3| for all k > 0. Moreover, since To = 0.499, by Lemma [5J we have 

Pi = P\ = -^fi = 0.499*+ 1 • Now ' b y aPPiying Lemma[3]with J3] and /3 2 equal to jSf and $ 
respectively, we obtain the estimates ( 150b and l !51b . □ 

Remark 5 The worst case complexity of Algorithm [T] is However, the constants in 

the estimations d50b and ( 151b also depend on the choices of jSf and j6 2 , which satisfy the 
condition ( 129b . The values of /3f and /3 2 will affect the accuracy of the duality and feasibility 
gaps. 

In Algorithm [T] we can use a simple update rule = where a > is arbitrarily chosen 
such that the condition T^ + i < holds. However, the rule I l47b is the tightest one. 
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4 Switching decomposition algorithm 

In this section, we apply the switching strategy to obtain a new variant of the first algorithm 
proposed in 1311 Algorithm 1] for solving problem (|2}- This scheme alternately switches 
between the primal and dual step depending on the iteration counter k being even or odd. 
Apart from its application to Lagrangian dual decomposition, this variant is still different 
from the one in 1311 at two points. First, since we assume that the objective function is 
not necessarily smooth, instead of using the gradient mapping in the primal scheme, we 
use the proximal mapping defined by ( 127b to construct the primal step. In contrast, since 
the objective function in the dual scheme is Lipschitz continuously differentiable, we can 
directly use the gradient mapping to compute y + (see d55b). Second, we use the exact update 
rule for T instead of the simplified one as in 1311 . 



4. 1 The gradient mapping of the smoothed dual function 

Since the smoothed dual function d(-;ft) is Lipschitz continuously differentiable on R m 
(see LemmaQ}. We define the following mapping: 

G(y;ft) := argmax \vd(y-fr) T {y-y) - ^M\\ y -yf\ , (52) 

where L d (ft ) : = h\ (ft ) +L( (ft ) = ^ + and V d(y; ft ) = A v x\ (y ; ft ) + A 2 x* 2 (y; ft ) - 
b. This problem can explicitly be solved to get the unique solution: 

G(S;pi) = 1 ^[Ax*(S;p l )-b]+S. (53) 

The mapping G(-;ft) is called gradient mapping of the function £/(-;ft) (see [29]). 



4.2 A decomposition scheme with primal-dual update 

First, we adapt the scheme OOb-Olb in the framework of primal and dual variant. Suppose 
that the pair (x,y) 6 X x R m satisfies the excessive gap condition ( 124b . The primal step is 
computed as follows: 

ri:=(l-T)x + Tx*(y;ft), 
(x+,t):=^(x,y;Pup2,*) <=► \y + ■= (1 - x)y + Ty* (54) 

and then we update jS[ + := (1 — T)ft, where T € (0, 1) and /"(sft) is defined in ( 127b . The 
difference between schemes and sf p is that the parameter ft is fixed in stf p . 
Symmetrically, the dual step is computed as: 

f y:=(l-T)y + T/(x;ft), 
(jt f ,j^):=^(jc,y;ft,ft,T)<=>^ x+ := (l- T )x + «*(fcft), (55) 

( y+:=G(.y;ft), 
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where T 6 (0, 1). The parameter /3i is kept unchanged, while fa is updated by jS^ := (1 — 

The following result shows that (jt + ,y + ) generated either by .a/^ or by srf maintains 
the excessive gap condition I 



Lemma 6 Suppose that (x,y) G X X R m satisfy @D with respect to two values jSi and fa. 
Then (x + ,y + ) generated either by scheme gtf p or by stf d is in X x K m and maintains the 
excessive gap condition d24b with respect to either two new values /3j + and fa or fa and faf 
provided that the following condition holds: 

fafa>^L max ilMl\. (56) 



1 — T l</<2 [ O", 

The proof of this lemma is quite similar to [31 Theorem 4.2.] that we omit here 



Remark 6 Given jSi > 0, we can choose fa > such that the condition l !29l l holds. Let 

y c : = 6 K m , we compute a point (5r,yP) as: 

jf> :=x*(f;fa) and f :=G(f;fa) = —L-(Ax-c)+f. (57) 

L d(fa) 

Then, similar to d28b . the point (Jc°,y°) satisfies d24b . Therefore, we can use this point as a 
starting point for Algorithm [2]below. 

In Algorithm|2]below we apply either the primal scheme or the dual scheme srf d by 
using the following rale: 

Rule A. If the iteration counter k is even then apply g/ p . Otherwise, stf d is used. 

Now, we provide an update rule to generate a sequence {t^} such that the condition ( 156b 
NUj|| 2 1 

holds. Let L := 2 max < - — — > . Suppose that at the iteration k the condition l !56b holds, 
i</<2 |_ o~, J 

i.e.: 

P1P2 > r^r L ™ 

1 - T k 

Since at the iteration k+1, we either update /3f or /3|. Thus we have jSf+'jS^ 1 = (1 - 
TjQjSfjS*. However, as the condition ( |58] | holds, we have (1 — T^jSfjSl > T?L. Now, we 
suppose that the condition d56b is satisfied with f} k+1 and J3* > i.e.: 

T 2 



pk+itf+i > ^±L_I. (59) 
1 - 

This condition holds if T^L > -j-^-j-L, which leads to t| +1 +T^T^_)-i — i\ < 0. Since Tj, T^+i 6 
(0, 1), we obtain: 

<%■ (60) 



ft 



+4-T t 



The tightest rule for updating is: 



(61) 
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for all k > and To 6 (0, 1) given. Associated with {Tk}, we generate two sequences {jS*} 
and {jS|} as: 

A+1;= f(l-T t )ft* if *is even ^ , +1 , = f/?f if/ciseven (g2) 

1/3* otherwise, " 1 (1 — T^)/^ otherwise, 

where /3} 1 = /3 2 ° = j3 > are fixed. 

Lemma 7 Let {t^}, {/3*} {/3|} be three sequences generated by l !61b awrf l !62b . respec- 
tively. Then: 



1 " ^ < ft4 < 2/3 vT^ ; ^ ME^ < ft * < 4, (63 ) 



2T fc+l 1 T £ 2T Jfc+ 1 T £ 

/or a// & > 1. 

The proof of this lemma can be found in the appendix. 

Remark 7 We can see that the right-hand side X\ki^o) := ^(k+rl) °^ ^3b is decreasing in 
(0, 1) for k > 1. Therefore, we can choose To as large as possible to minimize r\k{-) in (0, 1). 
For instance, we can choose To := 0.998 in Algorithmic 

Note that Lemma[7]shows that Tj ~ O(j)- Hence, in Algorithm[2] we can also use a simple 
updating rale for T* as Tj = -p^, where a 6 (|,2) and Z? > > 0. This update satisfies 



4.3 The algorithm and its worst-case complexity 



Suppose that the initial point (x°,y°) is computed by ( 157b . Then, we can choose jSf = jS, = 
f ll^'ll 2 1 

2 max < — - — > which satisfy ( |29b . The algorithm is now presented in detail as follows: 



i<;<2 I a, 



ALGORITHM 2 (Decomposition Algorithm with Primal-Dual Update) 



Initialization: 



1. Choose T := 0.998 and set jSf = J3£ := J 2max 1 < i < 2 {^}. 

2. Compute j£° and y° as: 

*° :=**(/;#>), andy := -L^(Ax°-b)+f 

Iteration: For k = 0, 1, • • ■ do 

1. If a given stopping criterion is satisfied then terminate. 

2. If k is even then: 

2a) Compute (x k+ \f +1 ) as: 
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2b) Update the smoothness parameter jSf' as /3* +1 := (1 — Tjt)j3*. 

3. Otherwise, i.e. if k is odd then: 

3a) Compute as: 

(^+i,/+i):=^(^,/ ;j Sf, J S|, T ,). 

3b) Update the smoothness parameter jS| as jSj +1 := (1 — ^k)p2- 

4. Update the step size parameter as: T^+i := y [^/ + Tjtj . 

End of For. 



The main steps of Algorithm [2] are Steps 2a and 2b, which requires us to compute either 
a primal step or a dual step. In the primal step, we need to solve two convex problem pairs 
in parallel, while in the dual step, it only requires to solve two convex problems in parallel. 
The following theorem shows the convergence of this algorithm. 

Theorem 3 Let the sequence {(i^, y k )}k>0 be generated by Algorithm^ Then the duality 
and feasibUity gaps satisfy: 



and 

\\A#-b\ 



0.998£ 



y*\\ + J\\y*\\ 2 + 2(D l+ D 2 ) 



(65) 



where L:=2 max i — /■ and k > 1. 



l</<2 [ (7, 

Proof The conclusion of this theorem follows directly from Lemmas [3] and [5] the condition 
T = 0.998, P° = J3 2 ° = Vl and the fact that jSf < J3|. □ 

Remark 8 Note that the worst-case complexity of Algorithm [2] is still O(^). The constants 
in the complexity estimates d501 > and l !51b are similar to the one in i 64b and fl65l l. respectively. 
As we discuss in Section |6] below, the rate of decrease of in Algorithmic] is smaller than 
two times of Tk in Algorithm [T] Consequently, the sequences {jS*} and {jS|} generated by 
Algorithm [T] approach zero faster than the ones generated by Algorithm|2] 

Remark 9 Note that the role of the schemes ,b^ p and ,B^ d in Algorithm[2]can be exchanged. 
Therefore, Algorithm [2] can be modified at three steps to obtain a symmetric variant as 
follows: 

1. At Step 2 of the initialization phase, d28b to compute x° and y° instead of d57b . 

2. At Steps 2a, stf p is used if the iteration counter k is odd. Otherwise, we use srf d at Step 
3a. 

3. At Steps 2b, /3| is updated if k is odd. Otherwise, jS* is updated at Step 3b. 



5 Application to strongly convex programming problems 

If 0; (i = 1 , 2) in (O is strongly convex then the convergence rate of the dual scheme i55 b 
can be accelerated up to 0{-k). 
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Suppose that 0/ is strongly convex with a convexity parameters ct, > (/ = 1,2). Then the 
function d defined by (|5} is well-defined, concave and differentiable. Moreover, its gradient 
is given by: 

Vd(y)=A lX Uy)+A 2 4(y)-b, (66) 

which is Lipschitz continuous with a Lipschitz constant := + . The excessive 
gap condition 024b in this case becomes: 

f{x;fa)<d(y), (67) 

for given x E X, y € W" and JS2 > 0. From Lemma [3] we conclude that if the point (x,y) 
satisfies 067b then, for a given y* 6 F*, the following estimates hold: 

-2p2\\y*\\ 2 <-\\y*\\\\Ax-b\\ < $(x) -d(y) <0, (68) 

and 

\\Ax-b\\ < 2ft 11/ 1|. (69) 

We now adapt the dual scheme J55b to this special case. Suppose (x,y) £ X x W n satisfies 
067b . we generate a new pair (x + ,y + ) as 



(x+,y+):=^/(x,y ;j S 2 ,T) 



y-= (l-z)y + zy*(x;p2), 
x+ := (\-T)x + Tx*(y), (70) 



where y(jc;ft) = -^(Ax — b), andx*(y) := (x\(y),X2(y)) is the solution of the minimiza- 
tion problem in {5). The parameter ft is updated by ft. + := (1 — T)ft and T 6 (0, 1) will 
appropriately be chosen. 

The following lemma shows that (x + ,y + ) generated by 070b satisfies 067b whose proof 
can be found in (3D . 

Lemma 8 Suppose that the point (x,y) £ X x W satisfies the excessive gap condition 067 b 
with the value ft. Then the new point (x + ,y + ) computed by 070b is in X x W" and also 
satisfies 067b with a new parameter value ft^ provided that 

ft>7^. (71) 

Now, let us derive the rule to update the parameter T. Suppose that ft satisfies 071b . Since 

ft 1- = (1 — T)ft, the condition 071b holds for ft+ if T 2 > . Therefore, similar to Algo- 
rithm^ we update the parameter T by using the rule 047b . The conclusion of Lemma[7]still 
holds for this case. 

Before presenting the algorithm, it is necessary to find a starting point (it ,.)' ) which 
satisfies {57). Let y c = 6 W and ft = . We compute y°) as 

x° := x*(f) and j7° := ±(Ax° - b) +f. (72) 

It follows from Lemma 7.4 [31] that (jc ,^) satisfies the excessive gap condition 067b . 

Finally, the decomposition algorithm for solving the strongly convex programming prob- 
lem of the form {2) is described in detail as follows: 
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ALGORITHM 3 (Decomposition algorithm for strongly convex objective function) 
Initialization: 

1. Choose T := 0.5. Set /3 2 ° = + ^3"- 

2. Compute jc° and y° as: 

x° := x*(f) and f := -L(Ax° - b) +f. 
Iteration: For k = 0, 1, • • • do 

1 . If a given stopping criterion is satisfied then terminate. 

2. Compute (x k+l ,y k+l ) using scheme ( 170b : 

3. Update the smoothness parameter as: jSj +1 := (1 — T,t)jS|. 

4. Update the step size parameter as: T^+i := y fy/ + 4 — T,tj . 

End of For. 

The convergence and the worst-case complexity of Algorithm [3] are stated as in Theorem|4] 
below. 

Theorem 4 Let {(x^ ,y k )}k>o be a sequence generated by Algorithm\3\ Then the following 
duality and feasibility gaps are satisfied: 

-%^f<H^)-d(y k )<0, (73) 

whereL o := \hit + i^t_ 



Proof From the update rule of % , we have (1 — T^+i ) = Moreover, since (k + = (1 — 
T k )f}i, it implies that = jS^IlioC 1 ~ x i) = ^T ' tf- By using the inequalities (80]l 

and ft = La, we have p k+l < 4 ^['~^ . With T = 0.5, one has J3| < By substituting 

this inequality into ( 168b and l |69b , we obtain ( 173b and l !74b . respectively. □ 

Theorem [4] shows that the worst-case complexity of Algorithm [3] is 0(-^). Moreover, at 
each iteration of this algorithm, only two convex problems need to be solved in parallel. 



6 Discussion on implementation and comparison 

6.1 The choice of prox-functions and the Bregman distance 

Algorithms[T]and|2]require to build a prox-function for each feasible set X; for i = 1,2. For a 
nonempty, closed and bounded convex setX,, the simplest prox-function is ptixi) := y ||jtj — 
x;|| 2 , for a given x-, 6 X, and p,- > 0. This function is strongly convex with the parameter 
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<7, = pi and the prox-center is Jc,, (i = 1,2). In implementation, it is worth to investigate the 
structure of the feasible set X, in order to choose an appropriate prox-function and its scaling 
factor pi for each feasible subset Xi (i = 1,2). 

In I27K we have used the Euclidean distance to construct the proximal terms. It is pos- 
sible to use a generalized Bregman distance in these problems which is compatible to the 
prox-function /?, and the feasible subset X; (i= 1,2). Moreover, a proper choice of the norms 
in the implementation may lead to a better performance of the algorithms, see (31 1 for more 
details. 



6.2 Extension to a multi-component separable objective function 

The algorithms developed in the previous sections can be directly applied to solve problem 
HJ in the case M > 2. First, we provide the following formulas to compute the parameters 
of Algorithms QJH 

f \\ A i II 2 

1. The constant L in Theorems[2]and[3]is replaced by Lm = M max < - — — 

\<km { a, 

2. The initial values of f}° and jS 2 ° in Algorithms |2] and are jSf = J3| = V^m- 

3. The Lipschitz constant Lj \p\) in Lemma[2]is Lf(j3 2 ) = ^f- (i = 1, . . . ,M). 

M 11^.112 



4. The Lipschitz constant Lrf(jSi) in Lemma[T]is := — ~ 

M 

5. The Lipschitz constant in Algorithm[3]is L 1 ^ := ^ 



M \\M\ 2 



Note that these constants depend linearly on M and the structure of matrix A; (i= 1, . . . ,M). 

Next, we rewrite the smoothed dual function d(y; f3\ ) defined by i ll II I for the case M > 2 
as follows: 

M 

where M function values dt(y; jSi) can be computed in parallel as: 

<k(y;Pi) = --^-bfy +min +/A,x,+jSip/(j:,)} . 

Note that the term — jjbfy is also computed locally for each component subproblem instead 
of computing separately as in (lilt . The quantities y and y + := G(y; /3i ) defined in d54l i and 
i55l can respectively be expressed as: 



M 



1 , 1 



f (1 _ X )y + (1 _ T ) £ _ (A^ - , 



i/ 



andy + :=j>+£ 



These formulas show that each component of y and y + can be computed by only using the 
local information and its neighborhood information. Therefore, both algorithms are highly 
distributed. 
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Finally, we note that if there exists a component 0, of the objective function (j) which is 
Lipschitz continuously differentiable then the gradient projection mapping G;(jc; ft) defined 
by d42b corresponding to the primal convex subproblem of this component can be used 
instead of the proximity mapping P;(jc; ft) defined by ( 127b . This modification can reduce the 
computational cost of the algorithms. Note that the sequence {Tyt}^>o generated by the rale 
( 147b still maintains the condition ( 145b in Remark[3] 



6.3 Stopping criterion 

In practice, we do not often encounter a problem which reaches the worst-case complexity 
bound. Therefore, it is necessary to provide a stopping criterion for the implementation of 
Algorithms[T]|2]and|3]to terminate earlier than using the worst-case bound. In principle, we 
can use the KKT condition to terminate the algorithms. However, evaluating the global KKT 
tolerance in a distributed manner is impractical. 

From Theorems [2] and [3] we see that the upper bound of the duality and feasibility gaps 
do not only depend on the iteration counter k but also on the constants L, D; and y* 6 Y*. 
The constant L can be explicitly computed based on matrix A and the choice of the prox- 
functions. We now discuss on the evaluations of Z), and y* in the case Xj is unbounded. Let 
sequence {(x k ,y k )} be generated by Algorithm [T] (or Algorithm [2}- Suppose that {{x k ,y k )} 
converges to (x*,y*) 6 X* x Y* . Thus, for k sufficiently large, the sequence {{x^ ,y k )} is 
contained in a neighborhood of X* x Y* . Given CO > 0, let us define 

£>■ := max pAxi) + co and y k := max lly'll +co. (75) 

' 0<j<k ' 0</<*" " 

We can use these constants to construct a stopping criterion in Algorithms [T] and [2] More 
precisely, for a given tolerance e > 0, we compute 



e d :=^(D k i+Di), and e/ ,:=ft* 



f + J(j k ) 2 + 2{D k +D k 2 ) 



(76) 



at each iteration. We terminate Algorithm [T]if < e and e p < e. A similar strategy can also 
be applied to Algorithms [2] and [3] 



6.4 Comparison. 

Firstly, we compare Algorithms [T] and [2] From Lemma|3]and the proof of Theorems [2] and 
[3] we see that the rate of convergence of both algorithms is as same as of J3* and J3*. At 
each iteration, Algorithm Q] updates simultaneously jSf and jS| by using the same value of 
%k, while Algorithm[2]updates only one parameter. Therefore, to update both parameters jS* 
and j6|, Algorithm[2]needs two iterations. We analyze the update rale of in Algorithms[T] 
and [2] to compare the rate of convergence of both algorithms. 

Let us define 

llW := and &( T ) := \ t] . 

The function to can be rewritten as Bj(t) = 1 . Therefore, we can easily show 

that: 

&(T)<&(T)<2&(T). 
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If we denote by {T^'}jt>o and {rf 2 }k>o the two sequences generated by Algorithms [T] and 
[2] respectively then we have T^' < T^ 2 < 2t^' for all k provided that 2Tq 1 > T^ 2 . Since 
Algorithm [T] updates jS* and /3| simultaneously while Algorithm [2]updates each of them at 
each iteration. If we choose Tjf 1 = 0.499 and T^ 2 = 0.998 in Algorithms Q] and |2] respec- 
tively, then, by directly computing the value of T^ 1 and T^ 2 , we can see that 2t^' > 2t^ 2 
for all k > 1. Consequently, the sequences {jSf } and {jS|} in Algorithm Q] converge to zero 
faster than in Algorithmic In other words, Algorithm [T]is faster than Algorithm [2] 

Now, we compare Algorithm[T] Algorithm [2] and Algorithm 3.2. in 1271 (see also |38 1). 
Note that the smoothness parameter /3i which is also denoted by c is fixed in Algorithm 3.2 
of 1271 . Moreover, this parameter is proportional to the given desired accuracy e, which is 
often very small. Thus, the Lipschitz constant L d ) is very large. Consequently, Algorithm 
3.2. of 1271 makes a slow progress at the very early iterations. In Algorithms Q] and [2] the 
parameters /3i and fa are dynamically updated starting from given values. Besides, the cost 
per iteration of Algorithm 3.2 11271 is more expensive than Algorithms [T] and [2] since it 
requires to solve two convex problem pairs in parallel and two dual steps. 



7 Numerical Tests 

In this section, we verify the performance of the proposed algorithms by applying them to 
solve the following separable convex optimization problem: 

M 

min \<j>(x) := £^(x,)}, 

x={x 1 ,....x M ) <- j_J > 

£ W M (7?) 
S.t. < (=)b, 

< Xj < Uj, i = 0, . . . ,M, 

where 0, : R" r — > R is convex, b, l, and «, e W z are given for i = 1 , . . . , M. The problem l !77t 
arises in many applications including resource allocation problems 1 19] and DSL dynamic 
spectrum management problems 1381 . In the case of inequality coupling constraints, we 
can bring the problem d77t in to the form of ([T) by adding a slack variable xm+i as a new 
component. 



7.1 Implementation details 

We implement Algorithms [T]and|2]proposed in the previous sections to solve d77b . The im- 
plementation is carried out in C++ running on a 16 cores workstation Intel®Xeron 2.7GHz 
and 12 GB of RAM. To solve general convex programming subproblems, we implement a 
primal-dual predictor-corrector interior point method. All the algorithms are parallelized by 
using OpenMP. 

The prox-functions := f — x?|| 2 are used, where is the center of the boxXj := 
[/,-, Ui] and p := 1 for all i = 1, . . . ,M. We terminate Algorithms [T]and|2]if rpf gap := ||A#* — 
bh/Ph < % and either rdfgap := max {o,j3f ££j D Xi - ^\\Ax* -b\\ 2 } < e d (|«?(^)| + 
1) or the value of the objective function does not significantly change in 3 successive it- 
erations, i.e. \(j)(x k ) — I /max{ 1.0, \(j)(x k )\} < e$ for j = 1,2,3, where e p = 10~ 2 , 
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£d = 10 _1 and £0 = 10~ 5 are given tolerances. Note that the quantity rdfgap is computed 
in the worst-case complexity, see Lemma [3] 

To compare the performance of the algorithms, we also implement the proximal-center- 
based decomposition algorithm proposed in 1271 Algorithm 3.2.] and an exact variant of the 
proximal-based decomposition in (JJ Algorithm I] for solving fl77b which we name PCBD 
and EPBD, respectively. The prox-function of the dual problem is chosen as dy(y) := | ||v|| 2 
with p := 1.0 and the smoothness parameter c of PCBD is set to c := v ji/ p n , where Dx is 

defined by d 14b - We terminate PCBD if the relative feasibility gap rpf gap < e p and either 
the objective value reaches the one reported by Algorithm [TJ or the maximum number of 
iterations maxiter = 10,000 is reached. 



7.2 Numerical results and comparison 

We test the above algorithms for three examples. The two first examples are resource alloca- 
tion problems and the last one is a DSL dynamic spectrum management problem. The first 
example was considered in 1201 . while the problem formulation and the data of the third 
example are obtained from [38 ]. 

7.2.7. Resource allocation problems. Let us consider a resource allocation problem in the 
form of ( 177b where the coupling constraint Y<i=i x i = bis tackled. 

(a) Nonsmooth convex optimization problems. In the first numerical example, we choose 
n x = 1, M = 5, the objective function 0,-(jc,-) := i\xi — i\ which is nonsmooth and b = 10 as in 
|20|. The lower bound is set to = — 5 and the upper bound Ui is K; = 7 for i = 1,...,M. 
With these choices, the optimal solution of this problem is x* = (—4,2,3,4,5). 

We use four different algorithms which consist of Algorithm [TJ Algorithm [2] PCBD in 
1271 and PCBD in (JJ Algorithm I] to solved problem I l77t . The approximate solutions re- 
ported by these algorithms after 100 iterations are je* = (-3.978,2,3,4,5), (-3.875, 1.983, 
2.990,3.996,5), (-4.055,2,3,4,5) and (-4.423,2,3,4,5), respectively. The correspond- 
ing objective values are (j)(x k ) = 4.978, 4.954, 5.055 and 5.423, respectively. 

The convergence behaviour of four algorithms is shown in Figure [TJ where the relative 
error of the objective function re$ := \<t>(x k ) — <j>*\/\<j>*\ is plotted on the left and the relative 
error of the solution re v := ||jt* — jc*||/||jc*|| is on the right. As we can see from these figures 
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Fig. 1 The relative error of the approximations to the optimal value (left) and to the optimal solution (right). 

that the relative errors in Algorithm |2] PCBD and EPBD oscillate with respect to the iteration 
counter while they are decreasing monotonously in Algorithm [T] The relative errors in Al- 
gorithms [TJ and [2] are approaching zero earlier than the ones in PCBD and EPBD. Note that in 
this example a nonmonotone variant of the PCBD algorithm 12711381 is used. 
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(b) Nonlinear resource allocation problems. In order to compare the efficiency of Algorithm 
[T] Algorithm[2]and PCBD, we build two performance profiles of these algorithms in terms of 
total iterations and total computational time. 

In this case, the objective function (j), is chosen as <j>i(xi) = afxj — w,ln(l + bfxi), where 
the linear cost vector a,-, vector b { and the weighting vector w, are generated randomly in 
the intervals [0,5], [0, 10] and [0,5], respectively. The lower bound and the upper bound are 
set to = (0, . . . , 0) T and m; = ( 1 , . . . , 1 ) T , respectively. Note that the objective function 0,- 
is linear if w, = and strictly convex if W; > 0. 

We carry out three algorithms for solving a collection of 50 random test problems with 
the size varying from M = 10 to M = 5,000 components, m = 5 to 300 coupling constraints 
and n = 50 to 500,000 variables. The performance profiles are plotted in Figure [2] which 
include the total number of iterations (left) and total computational time (right). The nu- 
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Fig. 2 Performance profile of three algorithms in log, scale: Left-Number of iterations, Right-CPU time. 

merical test on this collection of problems shows that Algorithm [T] solves all the problems 
and Algorithm [2] solve 48/50 problems, i.e. 96% of the collection. PCBD only solves 31/50 
problems, i.e. 62% of the collection. However, Algorithms [T] is the most efficient. It solves 
up to more than 81% problems with the best performance. PCBD is rather slow and exceeds 
the maximum number of iterations in many of the test problems (19 problems). Moreover, 
it is rather sensitive to the smoothness parameter. 

7.2.2. DSL dynamic spectrum management problem. In this example, we apply the proposed 
algorithms to solve a separable convex programming problem arising in DSL dynamic spec- 
trum management. This problem is a convex relaxation of the original DSL dynamic spec- 
trum management formulation considered in 11381 . 

Since the formulation given in [38] has an inequality coupling constraint Y4L1 x i < b, by 
adding a new slack variable xm+\ such that Y^=i x i = b and < xm+i < b, we can transform 
this problem into lO. The objective function of the resulting problem becomes: 







if i 
if i 



■-M+1 



M. 



(78) 



Here, a t 6 W, c;, g; 6 and H t := (hj k ) e R+ x "', (i = 1, . . .,M). The function ^ is convex 
(but not strongly convex) for all i = 1,...,M+ 1. As described in [38] that the variable jc, 
is referred to as transmit power spectral density, = N for all i = 1, ... ,M is the number 
of users, M is the number of frequency tones which is usually large and (j), is a convex 
approximation of a desired BER functiorQ the coding gain and noise margin. A detail model 
and parameter descriptions of this problem can be found in [38 1. 



Bit Error Rate function 
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We test three algorithms for the case of M = 224 tones and N = 7 users. The other 
parameters are selected as in 11381 . Algorithm [T]requires 922 iterations, Algorithm [2] needs 
1314 iterations, while PCBD reaches the maximum number of iterations k m . dx = 3000. The 
relative feasibility gaps ||Ax* — fe||/||fo|| reported by the three algorithms are 9.955 x 10~ 4 , 
9.998 x 10~ 4 and 2.431 x 10 -2 , respectively. The obtained approximate solutions of three al- 
gorithms and the optimal solution are plotted in Figure [3]which represent the transmit power 
with respect to the frequency tones. The relative errors of the approximation xr to the op- 
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approximate solutions of the DSL-dynamic spectrum management problem 1771 reported by three 
and the optimal solution. 



timal solution x* , err t := ||jc* -x* \\/\\x* ||, are 0.00853, 0.00528 and 0.03264, respectively. 
The corresponding objective values are 13264.68530, 13259.67633 and 13405.79722, re- 
spectively, while the optimal value is 13267.11919. 

Figure [3] shows that the solutions reported by three algorithms are consistently close to 
the optimal one. As claimed in [38 ], PCBD works much better than subgradient methods. 
However, we can see from this application that Algorithms [T] and [2] require fewer iterations 
than PCBD to reach a relatively similar approximate solution. 



8 Conclusions 

In this paper, two new algorithms for large scale separable convex optimization have been 
proposed. Their convergence has been proved and complexity bound has been given. The 
main advantage of these algorithms is their ability to dynamically update the smoothness pa- 
rameters. This allows the algorithms to control the step-size of the search direction at each 
iteration. Consequently, they generate a larger step at the first iterations instead of remain- 
ing fixed for all iterations as in the algorithm proposed in II27I . The convergence behavior 
and the performance of these algorithms have been illustrated through numerical examples. 
Although the global convergence rate is still sub-linear, the computational results are re- 
markable, especially when the number of variables as well as the number of nodes increase. 
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From a theoretical point of view, the algorithms possess a good performance behavior, due 
to their numerical robustness and reliability. Currently, the numerical results are still prelim- 
inary, however we believe that the theory presented in this paper is useful and may provide 
guidance for practitioners. Moreover, the steps of the algorithms are rather simple so they 
can easily be implemented in practice. Future research directions include the dual update 
scheme and extensions of the algorithms to inexact variants as well as applications. 
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A. The proofs of Technical Lemmas 

This appendix provides the proofs of two technical lemmas stated in the previous sections. 
A.l. The proof of Lemma|4] The proof of this lemma is very similar to Lemma 3 in 1311 . 
Proof Let y :=y*(i;ft) := jj-{Ax-b). Then it follows from (2T) that: 

Y(,x; ft ) < V(x; ft) + V, ft ) r (*i - M ) + V 2 v(t ft ) r (*2 - * 2 ) 

(79) 

def.Kr(.;&) 1 | U - i 1 1 2 | ~T a i .,,J,, 5 -,, L l(ft>ll - ||2 , ^(ft) n .,,2 

= Tr7rW A x- b \\ +}' M{xi-xi)+y A 2 {x 2 -X2)^ — — — H — — 1|*2-*2|| ■ 
2f<2 2 2 

= f(Ax-b)-^\\Ax-bf + ^\\x l -x l f + ^\\x 2 -x 2 \\ 2 - 

By using the expression f(x;j} 2 ) = <j>(x) + y/(jc;ft), the definition of x, the condition )29t and )79t we have: 

f(?,k) < Hx)+y T Mxi-4)+fMx2-x c 2 ) 

1 mm{(j>(x) + -^-\\Ax c - bf +fAi (*, -4)+fA 2 (x2 ~4) 
xex y. pi 

, ^f(ft) n Cl ,2 , 4(ft) „ c,|2l 1 „, ' hlfl 

+ — 2 — \\xi-x x \\ +^ — \\xi-x 2 \\ -b\\ 



^(ft)„. -„! , *7(fc),. rn 2 \ 1 „„, .,.2 



:rmn^W+y' (Ax-b) + ^^\\x l + -^f^\\x 2 -x^f } -—\\Ax<-b\ 



mm 



{^x)+f(Ax-b) + p l [p l (x i ) + p 2 (x 2 )]}-—\\Ax c -b\\ 2 



5 |n ' ' J v ' ' r^*" 1 ' • 2ft 1 

= d(y;p l )--L\\AS-b\\ 2 
2f>2 

which is indeed the condition 1241 . □ 



2S 



A.2. The proof of Lemma|7] 

Proof Let us define £(t) := — ,===—. It is easy to show that t, is increasing in (0,1). Moreover, Ti + i = 

% (t^) for all k > 0. Let us introduce m := 2/t. Then, we can show that < < STT- By usm g this 
inequalities and the increase of t, in (0, 1), we have: 

T ° - 2 <*<-^T-^T. (BO) 



l + 2To& uo + 2k uo + k 2 + tok 

Now, by the update rale (62), at each iteration k, we only either update /3f or jS, ■ Hence, it implies that: 

/3f=(l-To)(l-T 2 )---(l-T 2Lt . /2J )/3j\ 

j3|=(l-T 1 )(l-T 3 )---(l-T 2Lt/2J _ 1 )ft°- 

where |_jtj is the largest integer number which is less than or equal to the positive real number x. On the other 
hand, since T, + i < T, for i > 0, for any / > 0, it implies: 



(82) 



(i - T )nEo(i - ft) < [(i - *b)(i - n) ■ ■ ■ (i - r 2 ,)] 2 < nS'(i - n). 

and nrio'C 1 - *0 < K 1 - T i)0 -%)-(!- t 2 /-i)] 2 < (1 - To)" 1 1^(1 - T,). 
Note that nLnO - T i) = ^r^?, it follows from (81] and <82) for fc > 1 that: 

- '— L T k+l < /3f +l < -! Ut_i, and ^ ut+j < < ^t w . 

To To To To 

By combining these inequalities and 1801 . and noting that To 6 (0, 1), we obtain )631 . □ 
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